NMFk analysis: Geothermal data of Brady site, NV

This analysis demonstrates how NMFk can be applied to perform unsupervised machine-learning analyses.

The code below reproduces the ML work from a submitted research paper analyzing geothermal data of the Brady site, NV.

Import required Julia modules

If NMFk is not installed, first execute import Pkg; Pkg.add("NMFk"); Pkg.add("DelimitedFiles"); Pkg.add("Gadfly"); Pkg.add("Mads").

Read and pre-process the dataset

Set up the working directory containing the Brady site data

Load the data file

Populate the missing well names

Set missing entries to zero
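The two clean-up steps above can be sketched on a miniature table. This is a minimal illustration, assuming that missing entries appear as empty strings after the file is read; the matrix `d` and its contents are hypothetical, not the Brady data.

```julia
# Hypothetical miniature table: column 1 holds well names, column 2 a measurement.
# Empty strings mark missing entries (an assumption about the input file).
d = Any["W1" 10.5; "" 11.0; "W2" ""; "" 9.2]

# Populate missing well names by carrying the last seen name downward.
for i in 2:size(d, 1)
    if d[i, 1] == ""
        d[i, 1] = d[i - 1, 1]
    end
end

# Set the remaining missing entries to zero.
d[d .== ""] .= 0

println(d)
```

The forward fill must run before the zero fill; otherwise the missing names would be overwritten with zeros.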

Define names of the data attributes (matrix columns)

Short names are used for coding.

Long names are used for plotting and visualization.

Define the attributes that will be processed

Index the attributes that will be processed

Show information about the processed data (min, max, count):
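A summary of this kind can be sketched in plain Julia. The matrix `X`, the names in `attribute_names`, and the choice to count nonzero entries (since missing entries were set to zero) are assumptions made for illustration; the notebook itself may rely on an NMFk helper for this step.

```julia
# Hypothetical data matrix: rows are samples, columns are attributes.
X = [1.0 20.0; 3.0 0.0; 2.0 40.0]
attribute_names = ["Temperature", "Porosity"]  # hypothetical names

# For each column, collect the minimum, maximum, and number of nonzero entries.
# Counting nonzero entries is an assumption: zeros stand for missing data here.
stats = [(name = attribute_names[j],
          lo = minimum(X[:, j]),
          hi = maximum(X[:, j]),
          n = count(!iszero, X[:, j])) for j in axes(X, 2)]

for s in stats
    println(s.name, ": min = ", s.lo, ", max = ", s.hi, ", count = ", s.n)
end
```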

Get well locations and production

Define well types

Show information about processed well attributes

Collect the well data into a 3D tensor

Tensor indices (dimensions) define depths, attributes, and wells.
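As a minimal sketch of this layout, the records of each well can be copied into a depth x attribute x well array. All names and values below are hypothetical, and marking cells without data as NaN is an assumption rather than the notebook's confirmed choice.

```julia
# Hypothetical long-format records: one row per (well, depth level).
wells = ["W1", "W1", "W2"]
depths = [1, 2, 1]                      # depth index of each record
vals = [1.0 10.0; 2.0 20.0; 3.0 30.0]   # rows match records; columns are attributes

well_names = unique(wells)
nd, na, nw = maximum(depths), size(vals, 2), length(well_names)

# Tensor dimensions: depth x attribute x well; NaN marks cells without data.
T = fill(NaN, nd, na, nw)
for r in eachindex(wells)
    w = findfirst(==(wells[r]), well_names)
    T[depths[r], :, w] = vals[r, :]
end

println(size(T))
```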

Define the maximum depth

The maximum depth limits the depth of the data included in the analyses.

The maximum depth is set to 750 m.

Normalize tensor slices associated with each attribute
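The depth cut and the per-attribute normalization can be sketched together. Min-max scaling to [0, 1] is an assumption here (the notebook may use a different NMFk normalization routine), and the tensor and `max_depth_index` are hypothetical stand-ins for the real data and the 750 m limit.

```julia
# Hypothetical tensor: 4 depth levels x 2 attributes x 3 wells.
T = reshape(collect(1.0:24.0), 4, 2, 3)
max_depth_index = 3  # hypothetical; stands in for the 750 m limit

# Keep only depth levels up to the limit (indexing with a range makes a copy).
Tn = T[1:max_depth_index, :, :]

# Min-max scale each attribute slice independently so that attributes with
# different units become comparable. A constant slice would divide by zero
# and needs separate handling.
for a in axes(Tn, 2)
    slice = @view Tn[:, a, :]
    lo, hi = extrema(slice)
    slice .= (slice .- lo) ./ (hi - lo)
end

println(extrema(Tn))
```

Scaling each attribute separately, rather than the whole tensor at once, keeps a large-magnitude attribute from dominating the factorization.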

Define problem setup variables

Plot well data

An HTML file named "map/dataset-set00-v9-inv.html" is generated.

Link to "map/dataset-set00-v9-inv.html".

The file provides an interactive visualization of the data; it can be opened in any browser.

A static (PNG) version of the map looks like this:

Perform ML analyses

For the ML analyses, the data tensor will be flattened in two different ways:

Type 1 flattening: focus on locations

Flatten the tensor into a matrix

Matrix rows merge the depth and attribute dimensions.

Matrix columns represent the well locations.
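Because Julia arrays are column-major, this flattening is a single `reshape` when the tensor is ordered depth x attribute x well. The sketch below uses a hypothetical small tensor; the names `Tn` and `X1` are illustrative.

```julia
# Hypothetical normalized tensor: depth x attribute x well.
nd, na, nw = 3, 2, 4
Tn = reshape(collect(1.0:(nd * na * nw)), nd, na, nw)

# Merge depth and attribute into the rows; wells stay as columns.
# Row index for depth d and attribute a is d + (a - 1) * nd.
X1 = reshape(Tn, nd * na, nw)

println(size(X1))
```

Note that `reshape` shares memory with `Tn`; copy the result first if the matrix will be modified in place.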

Perform NMFk analyses

Here the NMFk results are loaded from prior ML runs.

As seen from the output, the ML analyses identified that the optimal number of geothermal signatures in the dataset is 6.

Solutions with a number of signatures less than 6 are underfitting.

Solutions with a number of signatures greater than 6 are overfitting and unacceptable.

The set of acceptable solutions is defined as follows:

The acceptable solutions contain 2, 5 and 6 signatures.
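One common way to arrive at such a set is to keep the values of k whose silhouette width exceeds a robustness cutoff and to take the largest of them as the optimal number. The sketch below illustrates that idea only: the silhouette values are invented for illustration (they are not the study's results), and the 0.25 cutoff is an assumption.

```julia
# Hypothetical silhouette widths for k = 2:8 (illustrative, not the study's values).
ks = collect(2:8)
silhouette = [0.95, 0.10, 0.20, 0.80, 0.75, 0.15, -0.05]
cutoff = 0.25  # assumed robustness threshold

# Acceptable solutions are those whose silhouette width exceeds the cutoff.
acceptable = ks[silhouette .> cutoff]

# The optimal number of signatures is taken as the largest acceptable k.
k_optimal = maximum(acceptable)

println("acceptable k: ", acceptable, ", optimal k: ", k_optimal)
```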

Post-process NMFk results

Plot of solution quality (fit) and silhouette width (robustness) for different numbers of signatures k:

The ML solutions containing 2, 5 and 6 signatures are further analyzed as follows:

Type 2 flattening: focus on attributes

Flatten the tensor into a matrix

Matrix rows merge the depth and well-location dimensions.

Matrix columns represent the well attributes.
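For this layout the attribute dimension has to be moved last before reshaping, which `permutedims` does. The sketch below uses a hypothetical small tensor ordered depth x attribute x well; the names `Tn` and `X2` are illustrative.

```julia
# Hypothetical normalized tensor: depth x attribute x well.
nd, na, nw = 3, 2, 4
Tn = reshape(collect(1.0:(nd * na * nw)), nd, na, nw)

# Reorder to depth x well x attribute, then merge depth and well into the rows.
# Row index for depth d and well w is d + (w - 1) * nd; attributes are columns.
X2 = reshape(permutedims(Tn, (1, 3, 2)), nd * nw, na)

println(size(X2))
```

Unlike a plain `reshape`, `permutedims` allocates a new array, so `X2` does not share memory with `Tn`.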

Perform NMFk analyses

Here the NMFk results are loaded from prior ML runs.

As seen from the output, the ML analyses identified that the optimal number of geothermal signatures in the dataset is 3.

Solutions with a number of signatures less than 3 are underfitting.

Solutions with a number of signatures greater than 3 are overfitting and unacceptable.

The set of acceptable solutions is defined as follows:

The acceptable solutions contain 2 and 3 signatures.

Post-process NMFk results

Plot of solution quality (fit) and silhouette width (robustness) for different numbers of signatures k:

The ML solutions containing 2 and 3 signatures are further analyzed as follows: